Overview:

This page contains the results of CoNGA analyses. Results in tables may have been filtered to reduce redundancy, focus on the most important columns, and limit length; full tables should exist as OUTFILE_PREFIX*.tsv files.
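The full tables mentioned above can be inspected outside this page. The following is a minimal sketch, assuming the tables are tab-separated files with a header row; the helper name `load_tsv_tables` is hypothetical and not part of CoNGA.

```python
import csv
from pathlib import Path


def load_tsv_tables(outfile_prefix, directory="."):
    """Return {filename: list of row dicts} for every <prefix>*.tsv file found."""
    tables = {}
    for path in sorted(Path(directory).glob(f"{outfile_prefix}*.tsv")):
        with path.open(newline="") as handle:
            # Each row becomes a dict keyed by the column names in the header.
            tables[path.name] = list(csv.DictReader(handle, delimiter="\t"))
    return tables


if __name__ == "__main__":
    # "tmp_rhesus" is the --outfile_prefix used in the command on this page.
    for name, rows in load_tsv_tables("tmp_rhesus").items():
        print(name, len(rows), "rows")
```

Note that the glob pattern also matches any other file sharing the prefix, such as the input clones file.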

Command:

scripts/run_conga.py --graph_vs_graph --graph_vs_features --find_hotspot_features --tcr_clumping --match_to_tcr_database --gex_data filtered_feature_bc_matrix.h5 --gex_data_type 10x_h5 --clones_file tmp_rhesus_clones.tsv --organism human --outfile_prefix tmp_rhesus

Stats

num_cells_w_gex: 15491
num_features_start: 26530
num_cells_w_tcr: 1418
min_genes_per_cell: 200
max_genes_per_cell: 3500
max_percent_mito: 0.1
num_filt_max_genes_per_cell: 91
num_filt_max_percent_mito: 0
num_antibody_features: 0
num_TR_genes: 43
num_TR_genes_in_hvg_set: 36
num_highly_variable_genes: 2088
num_cells_after_filtering: 1327
num_clonotypes: 1022
max_clonotype_size: 64
num_singleton_clonotypes: 942
nbr_frac_for_nndists: 0.01
num_gvg_hit_clonotypes: 78
num_gvg_hit_biclusters: 5

graph_vs_graph


Graph vs graph analysis looks for correlation between GEX and TCR space by finding statistically significant overlap between two similarity graphs, one defined by GEX similarity and one by TCR sequence similarity.

Overlap is defined one node (clonotype) at a time by looking for overlap between that node's neighbors in the GEX graph and its neighbors in the TCR graph. The null model is that the two neighbor sets are chosen independently at random.

CoNGA looks at two kinds of graphs: K nearest neighbor (KNN) graphs, where the neighborhood size K is specified as a fraction of the number of clonotypes (the default fractions are 0.01 and 0.1), and cluster graphs, where each clonotype is connected to all the other clonotypes in the same (GEX or TCR) cluster. Overlaps are computed 3 ways (GEX KNN vs TCR KNN, GEX KNN vs TCR cluster, and GEX cluster vs TCR KNN), for each of the neighbor fractions (called nbr_fracs for short).
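As a worked example of how a neighbor fraction translates into a neighborhood size, the sketch below multiplies each default fraction by the number of clonotypes reported in the Stats section. The truncation to an integer is an assumption (it is not stated on this page), although it does agree with the num_neighbors values of 10 and 102 shown in the table below.

```python
def neighborhood_size(nbr_frac, num_clonotypes):
    """Number of neighbors K for a given neighbor fraction."""
    # Assumption: the product is truncated to an integer, with a floor of 1.
    return max(1, int(nbr_frac * num_clonotypes))


num_clonotypes = 1022  # num_clonotypes from the Stats section
for nbr_frac in (0.01, 0.1):
    print(nbr_frac, neighborhood_size(nbr_frac, num_clonotypes))
```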

Columns (depend slightly on whether hit is KNN v KNN or KNN v cluster):

conga_score = P value for GEX/TCR overlap * number of clonotypes

mait_fraction = fraction of the overlap made up of 'invariant' T cells

num_neighbors* = size of neighborhood (K)

cluster_size = size of cluster (for KNN v cluster graph overlaps)

clone_index = 0-index of clonotype in adata object
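To illustrate the null model (two neighbor sets chosen independently at random) and the multiplication by the number of clonotypes, here is a minimal sketch that uses a hypergeometric upper tail. The function names are hypothetical, and the sketch is not expected to reproduce the table values exactly, since CoNGA's implementation may differ in details such as how the overlap is corrected.

```python
from math import comb


def overlap_pvalue(overlap, num_gex_nbrs, num_tcr_nbrs, num_candidates):
    """P(X >= overlap) when both neighbor sets are random draws from num_candidates."""
    total = comb(num_candidates, num_tcr_nbrs)
    upper = min(num_gex_nbrs, num_tcr_nbrs)
    # Sum the hypergeometric probabilities for every overlap at least this large.
    tail = sum(
        comb(num_gex_nbrs, k) * comb(num_candidates - num_gex_nbrs, num_tcr_nbrs - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def conga_style_score(pvalue, num_clonotypes):
    """P value multiplied by the number of clonotypes, as in the conga_score column."""
    return pvalue * num_clonotypes


# Toy numbers: 20 candidate neighbors, two neighborhoods of size 5 sharing 3 members.
p = overlap_pvalue(3, 5, 5, 20)
print(p, conga_style_score(p, 21))
```

Because the score is a P value scaled by the number of tests, values above 1 are possible and simply indicate no evidence of overlap.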


conga_score num_neighbors_gex num_neighbors_tcr overlap overlap_corrected mait_fraction clone_index nbr_frac graph_overlap_type cluster_size gex_cluster tcr_cluster va ja cdr3a vb jb cdr3b
1.362550e-08 NaN 10.0 10 9 1.000000 27 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-3*01 TRBJ2-6*01 CASSQGPGGASVLTF
1.261473e-07 NaN 10.0 10 8 1.000000 35 0.01 gex_cluster_vs_tcr_nbr 64.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV24-1*01 TRBJ2-6*01 CATSSDSSGASVLTF
1.261473e-07 NaN 10.0 10 8 1.000000 33 0.01 gex_cluster_vs_tcr_nbr 64.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV10-2*01 TRBJ2-6*01 CASSVDSDSGASVLTF
2.060094e-07 NaN 10.0 10 8 1.000000 31 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDQNYQLIW TRBV4-2*01 TRBJ2-6*01 CASSQDQEGTGASVLTF
1.572245e-06 NaN 10.0 9 8 1.000000 28 0.01 gex_cluster_vs_tcr_nbr 66.0 4 8 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-2*01 TRBJ2-6*01 CASSQVPGTGASVLTF
1.996482e-06 NaN 10.0 9 8 1.000000 22 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAALDSNYQLIW TRBV4-3*01 TRBJ1-2*01 CASSQGGAPDYDYTF
1.996482e-06 NaN 10.0 9 8 1.000000 41 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVWDSNYQLIW TRBV4-3*01 TRBJ2-2*01 CASSQDWGEPGAQLFF
6.715313e-06 NaN 10.0 8 8 1.000000 36 0.01 gex_cluster_vs_tcr_nbr 64.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV6-1*01 TRBJ2-2*01 CASSVRGETAQLFF
1.763080e-05 NaN 10.0 9 7 1.000000 38 0.01 gex_cluster_vs_tcr_nbr 64.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV6-2*01 TRBJ2-6*01 CASSEAASGASVLTF
2.688248e-05 NaN 10.0 9 7 1.000000 23 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAAMDSNYQLIW TRBV4-2*01 TRBJ2-2*01 CASSQAMGEHGAQLFF
2.688248e-05 NaN 10.0 9 7 1.000000 32 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDRDYQLIW TRBV19*01 TRBJ2-6*01 CASSSGNSGASVLTF
6.491113e-05 NaN 102.0 23 21 0.652174 34 0.10 gex_cluster_vs_tcr_nbr 64.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV10-3*01 TRBJ1-5*01 CASSEGGEVNQPQYF
6.491113e-05 NaN 102.0 23 21 0.652174 38 0.10 gex_cluster_vs_tcr_nbr 64.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV6-2*01 TRBJ2-6*01 CASSEAASGASVLTF
1.042252e-04 NaN 10.0 8 7 1.000000 29 0.01 gex_cluster_vs_tcr_nbr 66.0 4 8 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-3*01 TRBJ2-6*01 CASSQDIGGSSGASVLTF
1.599297e-04 NaN 102.0 26 20 0.730769 32 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDRDYQLIW TRBV19*01 TRBJ2-6*01 CASSSGNSGASVLTF
3.012966e-04 NaN 10.0 7 7 1.000000 34 0.01 gex_cluster_vs_tcr_nbr 64.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV10-3*01 TRBJ1-5*01 CASSEGGEVNQPQYF
3.019490e-04 NaN 10.0 9 6 1.000000 30 0.01 gex_cluster_vs_tcr_nbr 66.0 4 8 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV6-3*01 TRBJ2-6*01 CASNMRHSGASVLTF
3.611443e-04 NaN 102.0 22 20 0.681818 33 0.10 gex_cluster_vs_tcr_nbr 64.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV10-2*01 TRBJ2-6*01 CASSVDSDSGASVLTF
3.621572e-04 NaN 10.0 9 6 1.000000 26 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAPRDSNYQLIW TRBV6-1*01 TRBJ2-6*01 CASSEGYSGASVLTF
3.621572e-04 NaN 10.0 9 6 1.000000 25 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAPMDSNYQLIW TRBV10-1*01 TRBJ2-6*01 CASSWDNSGASVLTF
3.621572e-04 NaN 10.0 9 6 1.000000 39 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV10-1*01 TRBJ2-6*01 CASSDGGESGASVLTF
6.274256e-04 NaN 102.0 38 38 0.000000 567 0.10 gex_cluster_vs_tcr_nbr 182.0 2 6 TRAV27*01 TRAJ53*01 CAGAYSGSSNYKLTF TRBV20-1*01 TRBJ2-2*01 CSARRRTNTAQLFF
8.686281e-04 NaN 102.0 25 19 0.760000 39 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV10-1*01 TRBJ2-6*01 CASSDGGESGASVLTF
8.686281e-04 NaN 102.0 25 19 0.760000 25 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAPMDSNYQLIW TRBV10-1*01 TRBJ2-6*01 CASSWDNSGASVLTF
8.686281e-04 NaN 102.0 25 19 0.760000 27 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-3*01 TRBJ2-6*01 CASSQGPGGASVLTF
1.163187e-03 10.0 NaN 6 6 1.000000 36 0.01 gex_nbr_vs_tcr_cluster 46.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV6-1*01 TRBJ2-2*01 CASSVRGETAQLFF
1.847839e-03 NaN 102.0 21 19 0.714286 35 0.10 gex_cluster_vs_tcr_nbr 64.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV24-1*01 TRBJ2-6*01 CATSSDSSGASVLTF
1.980417e-03 102.0 NaN 26 26 0.000000 634 0.10 gex_nbr_vs_tcr_cluster 104.0 5 3 TRAV38-1*01 TRAJ43*01 CAFMKENNDIRF TRBV9*01 TRBJ1-3*01 CASSLGQESGNTVYF
4.247237e-03 102.0 NaN 20 20 0.000000 218 0.10 gex_nbr_vs_tcr_cluster 70.0 2 6 TRAV13-1*01 TRAJ47*01 CAAIFYGNKLIF TRBV20-1*01 TRBJ1-3*01 CSALNGGSGNTVYF
4.329108e-03 NaN 102.0 24 18 0.791667 18 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ20*01 CAVRDRDYKLSF TRBV4-3*01 TRBJ2-4*01 CASSQDLGGSDTQYF
4.329108e-03 NaN 102.0 24 18 0.791667 26 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAPRDSNYQLIW TRBV6-1*01 TRBJ2-6*01 CASSEGYSGASVLTF
4.552048e-03 102.0 NaN 21 15 0.904762 24 0.10 gex_nbr_vs_tcr_cluster 51.0 4 8 TRAV1-2*01 TRAJ33*01 CAFMDSNYQLIW TRBV6-1*01 TRBJ2-3*01 CASSGTGDTDPQYF
4.715481e-03 NaN 10.0 7 6 1.000000 18 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ20*01 CAVRDRDYKLSF TRBV4-3*01 TRBJ2-4*01 CASSQDLGGSDTQYF
6.117287e-03 NaN 102.0 22 18 0.772727 30 0.10 gex_cluster_vs_tcr_nbr 66.0 4 8 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV6-3*01 TRBJ2-6*01 CASNMRHSGASVLTF
6.117287e-03 NaN 102.0 22 18 0.772727 29 0.10 gex_cluster_vs_tcr_nbr 66.0 4 8 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-3*01 TRBJ2-6*01 CASSQDIGGSSGASVLTF
8.682594e-03 NaN 102.0 20 18 0.750000 36 0.10 gex_cluster_vs_tcr_nbr 64.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV6-1*01 TRBJ2-2*01 CASSVRGETAQLFF
1.067573e-02 10.0 NaN 7 5 0.857143 27 0.01 gex_nbr_vs_tcr_cluster 51.0 4 8 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-3*01 TRBJ2-6*01 CASSQGPGGASVLTF
1.122430e-02 102.0 NaN 32 32 0.000000 906 0.10 gex_nbr_vs_tcr_cluster 156.0 0 1 TRAV8-3*01 TRAJ8*01 CAVSERNTGFQKLVF TRBV23-1*01 TRBJ2-7*01 CASSPQGEYEQYF
1.976548e-02 NaN 102.0 23 17 0.826087 41 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVWDSNYQLIW TRBV4-3*01 TRBJ2-2*01 CASSQDWGEPGAQLFF
1.976548e-02 NaN 102.0 23 17 0.826087 23 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAAMDSNYQLIW TRBV4-2*01 TRBJ2-2*01 CASSQAMGEHGAQLFF
1.976548e-02 NaN 102.0 23 17 0.826087 31 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVRDQNYQLIW TRBV4-2*01 TRBJ2-6*01 CASSQDQEGTGASVLTF
1.976548e-02 NaN 102.0 23 17 0.826087 40 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVTDSNYQLIW TRBV6-1*01 TRBJ1-2*01 CASSDWDSNYDYTF
2.563956e-02 10.0 NaN 6 5 1.000000 23 0.01 gex_nbr_vs_tcr_cluster 51.0 4 8 TRAV1-2*01 TRAJ33*01 CAAMDSNYQLIW TRBV4-2*01 TRBJ2-2*01 CASSQAMGEHGAQLFF
2.713711e-02 NaN 102.0 21 17 0.809524 28 0.10 gex_cluster_vs_tcr_nbr 66.0 4 8 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-2*01 TRBJ2-6*01 CASSQVPGTGASVLTF
3.553112e-02 NaN 102.0 48 48 0.000000 825 0.10 gex_cluster_vs_tcr_nbr 296.0 0 1 TRAV8-2*01 TRAJ29*01 CAVNVSGNRALVF TRBV9*01 TRBJ2-1*01 CASSYRGWGDNEQFF
4.789739e-02 NaN 10.0 7 5 1.000000 40 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAVTDSNYQLIW TRBV6-1*01 TRBJ1-2*01 CASSDWDSNYDYTF
4.789739e-02 NaN 10.0 7 5 1.000000 24 0.01 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ33*01 CAFMDSNYQLIW TRBV6-1*01 TRBJ2-3*01 CASSGTGDTDPQYF
5.452475e-02 10.0 NaN 5 5 1.000000 24 0.01 gex_nbr_vs_tcr_cluster 51.0 4 8 TRAV1-2*01 TRAJ33*01 CAFMDSNYQLIW TRBV6-1*01 TRBJ2-3*01 CASSGTGDTDPQYF
5.815087e-02 102.0 102.0 23 23 0.000000 368 0.10 gex_nbr_vs_tcr_nbr NaN 6 3 TRAV19*01 TRAJ43*01 CALIQYNNNDIRF TRBV6-1*01 TRBJ2-3*01 CASSDFGLGQGYPQYF
6.007283e-02 NaN 102.0 20 17 0.700000 17 0.10 gex_cluster_vs_tcr_nbr 68.0 4 8 TRAV1-2*01 TRAJ12*01 CAVRDPGDGGYKLIF TRBV7-2*01 TRBJ2-7*01 CASSPSWSGSYAEQYF
Omitted 70 lines

graph_vs_graph_logos


This figure summarizes the results of a CoNGA analysis that produces scores (CoNGA) and clusters. At the top are six 2D UMAP projections of clonotypes in the dataset based on GEX similarity (top left three panels) and TCR similarity (top right three panels), colored from left to right by GEX cluster assignment; CoNGA score; joint GEX:TCR cluster assignment for clonotypes with significant CoNGA scores, using a bicolored disk whose left half indicates GEX cluster and whose right half indicates TCR cluster; TCR cluster; CoNGA; GEX:TCR cluster assignments for CoNGA hits, as in the third panel.

Below are two rows of GEX landscape plots colored by (first row, left) expression of selected marker genes, (second row, left) Z-score normalized and GEX-neighborhood averaged expression of the same marker genes, and (both rows, right) TCR sequence features (see CoNGA manuscript Table S3 for TCR feature descriptions).

GEX and TCR sequence features of CoNGA hits in clusters with 5 or more hits are summarized by a series of logo-style visualizations, from left to right: differentially expressed genes (DEGs); TCR sequence logos showing the V and J gene usage and CDR3 sequences for the TCR alpha and beta chains; biased TCR sequence scores, with red indicating elevated scores and blue indicating decreased scores relative to the rest of the dataset (see CoNGA manuscript Table S3 for score definitions); GEX 'logos' for each cluster consisting of a panel of marker genes shown with red disks colored by mean expression and sized according to the fraction of cells expressing the gene (gene names are given above).

DEG and TCRseq sequence logos are scaled by the adjusted P value of the associations, with full logo height requiring a top adjusted P value below 10^-6. DEGs with fold-change less than 2 are shown in gray. Each cluster is indicated by a bicolored disk colored according to GEX cluster (left half) and TCR cluster (right half). The two numbers above each disk show the number of hits within the cluster (on the left) and the total number of cells in those clonotypes (on the right). The dendrogram at the left shows similarity relationships among the clusters based on connections in the GEX and TCR neighbor graphs.

The choice of which marker genes to use for the GEX umap panels and for the cluster GEX logos can be configured using run_conga.py command line flags or arguments to the conga.plotting.make_logo_plots function.
Image source: tmp_rhesus_graph_vs_graph_logos.png

tcr_clumping


This table stores the results of the TCR "clumping" analysis, which looks for neighborhoods in TCR space with more TCRs than expected by chance under a simple null model of VDJ rearrangement.

For each TCR in the dataset, we count how many TCRs are within a set of fixed TCRdist radii (defaults: 24, 48, 72, 96), and compare that number to the number expected given the size of the dataset, using a Poisson model. The approach is inspired by the ALICE and TCRnet methods.
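The Poisson comparison can be sketched as follows. This is a minimal illustration, not CoNGA's code: the function names are hypothetical, and the correction factor (number of clonotypes times number of radii) is an assumption. With that assumption, the first row of the table below (num_nbrs = 7, expected_num_nbrs = 0.016757) gives a value close to the reported pvalue_adj of 2.966006e-13.

```python
from math import exp


def poisson_upper_tail(observed, expected, rel_tol=1e-16):
    """P(X >= observed) for X ~ Poisson(expected), summed directly over the tail."""
    if observed <= 0:
        return 1.0
    # First term of the tail: exp(-expected) * expected**observed / observed!
    term = exp(-expected)
    for k in range(1, observed + 1):
        term *= expected / k
    total, k = 0.0, observed
    while term > 0.0:
        total += term
        k += 1
        term *= expected / k
        if term < total * rel_tol:
            break  # remaining terms are negligible
    return total


def adjusted_clumping_pvalue(num_nbrs, expected_num_nbrs, num_clonotypes, num_radii):
    # Assumption: the raw P value is scaled by the number of tests performed.
    return poisson_upper_tail(num_nbrs, expected_num_nbrs) * num_clonotypes * num_radii


print(adjusted_clumping_pvalue(7, 0.016757, num_clonotypes=1022, num_radii=4))
```

Summing the tail directly, rather than computing one minus the cumulative probability, keeps precision for the very small P values that appear in this table.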

Columns:

clump_type = 'global' unless we are optionally looking for TCR clumps within the individual GEX clusters

num_nbrs = neighborhood size (number of other TCRs with TCRdist

clump_type clone_index nbr_radius pvalue_adj num_nbrs expected_num_nbrs raw_count va ja cdr3a vb jb cdr3b clonotype_fdr_value clumping_group clusters_gex clusters_tcr
global 25 96 2.966006e-13 7 0.016757 41032.0 TRAV1-2*01 TRAJ33*01 CAPMDSNYQLIW TRBV10-1*01 TRBJ2-6*01 CASSWDNSGASVLTF 2.966006e-13 1 4 8
global 39 96 7.272156e-13 7 0.019053 46654.0 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV10-1*01 TRBJ2-6*01 CASSDGGESGASVLTF 3.636078e-13 1 4 8
global 26 96 1.311137e-12 9 0.079370 194344.0 TRAV1-2*01 TRAJ33*01 CAPRDSNYQLIW TRBV6-1*01 TRBJ2-6*01 CASSEGYSGASVLTF 4.370455e-13 1 4 8
global 697 24 4.369259e-12 4 0.000400 980.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CAIAGTNYGEQFF 1.092315e-12 2 3 5
global 692 24 1.352794e-11 4 0.000531 1300.0 TRAV4*01 TRAJ33*01 CLGGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 2.254656e-12 2 3 5
global 696 24 1.352794e-11 4 0.000531 1300.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 2.254656e-12 2 3 5
global 32 96 2.862912e-09 5 0.009674 23687.0 TRAV1-2*01 TRAJ33*01 CAVRDRDYQLIW TRBV19*01 TRBJ2-6*01 CASSSGNSGASVLTF 4.089874e-10 1 4 8
global 698 24 1.332229e-08 3 0.000269 661.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSATNYGEQFF 1.665286e-09 2 1 5
global 697 48 1.965991e-08 4 0.003280 8031.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CAIAGTNYGEQFF 1.092315e-12 2 3 5
global 696 48 6.778152e-08 4 0.004470 10946.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 2.254656e-12 2 3 5
global 692 48 6.778152e-08 4 0.004470 10946.0 TRAV4*01 TRAJ33*01 CLGGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 2.254656e-12 2 3 5
global 699 24 1.013257e-07 3 0.000530 1300.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 8.443809e-09 2 3 5
global 35 96 8.392636e-06 4 0.014943 36770.0 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV24-1*01 TRBJ2-6*01 CATSSDSSGASVLTF 6.455874e-07 1 4 8
global 698 48 1.147078e-05 3 0.002565 6292.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSATNYGEQFF 1.665286e-09 2 1 5
global 697 72 3.917167e-05 4 0.021995 53857.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CAIAGTNYGEQFF 1.092315e-12 2 3 5
global 38 96 3.962047e-05 5 0.065747 161778.0 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV6-2*01 TRBJ2-6*01 CASSEAASGASVLTF 2.476280e-06 1 4 8
global 30 96 4.531022e-05 4 0.022814 55972.0 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV6-3*01 TRBJ2-6*01 CASNMRHSGASVLTF 2.665307e-06 1 4 8
global 699 48 6.030807e-05 3 0.004462 10946.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 8.443809e-09 2 3 5
global 696 72 1.350910e-04 4 0.030022 73511.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 2.254656e-12 2 3 5
global 692 72 1.350910e-04 4 0.030022 73511.0 TRAV4*01 TRAJ33*01 CLGGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 2.254656e-12 2 3 5
global 27 96 9.163108e-04 4 0.048630 119075.0 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-3*01 TRBJ2-6*01 CASSQGPGGASVLTF 4.363385e-05 1 4 8
global 26 72 1.032063e-03 3 0.011518 28202.0 TRAV1-2*01 TRAJ33*01 CAPRDSNYQLIW TRBV6-1*01 TRBJ2-6*01 CASSEGYSGASVLTF 4.370455e-13 1 4 8
global 28 96 3.820143e-03 3 0.017845 43780.0 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-2*01 TRBJ2-6*01 CASSQVPGTGASVLTF 1.660932e-04 1 4 8
global 528 24 5.008615e-03 1 0.000001 3.0 TRAV27*01 TRAJ17*01 CAGEEVASNKLTF TRBV21-1*01 TRBJ1-5*01 CASSKGQGDQPQYF 2.086923e-04 5 3 4
global 698 72 5.386327e-03 3 0.020021 49119.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSATNYGEQFF 1.665286e-09 2 1 5
global 28 72 1.117498e-02 2 0.002340 5741.0 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-2*01 TRBJ2-6*01 CASSQVPGTGASVLTF 1.660932e-04 1 4 8
global 39 72 1.784486e-02 2 0.002958 7242.0 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV10-1*01 TRBJ2-6*01 CASSDGGESGASVLTF 3.636078e-13 1 4 8
global 699 72 1.792119e-02 3 0.029963 73511.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 8.443809e-09 2 3 5
global 620 24 2.170395e-02 1 0.000005 13.0 TRAV35*01 TRAJ58*01 CAGQRQTGGSRLTF TRBV14*01 TRBJ1-2*01 CASSQGYDYTF 7.234651e-04 6 0 4
global 621 24 2.170395e-02 1 0.000005 13.0 TRAV35*01 TRAJ58*01 CAGQRQTGGSRLTF TRBV14*01 TRBJ1-2*01 CASSQGYDYTF 7.234651e-04 6 0 4
global 697 96 2.312312e-02 4 0.110345 270189.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CAIAGTNYGEQFF 1.092315e-12 2 3 5
global 529 24 4.173827e-02 1 0.000010 25.0 TRAV27*01 TRAJ17*01 CAGEGVASNKLTF TRBV21-1*01 TRBJ1-5*01 CASSSGQGDQPQYF 1.304321e-03 5 1 4
global 268 24 5.008587e-02 1 0.000012 30.0 TRAV17*01 TRAJ27*01 CATDANADKLTF TRBV19*01 TRBJ2-4*01 CASGQGGQNTQYF 1.517754e-03 4 3 2
global 269 24 6.678102e-02 1 0.000016 40.0 TRAV17*01 TRAJ27*01 CATDTNADKLTF TRBV19*01 TRBJ2-4*01 CASGQGGQNTQYF 1.964148e-03 4 3 2
global 31 96 8.341992e-02 2 0.006402 15676.0 TRAV1-2*01 TRAJ33*01 CAVRDQNYQLIW TRBV4-2*01 TRBJ2-6*01 CASSQDQEGTGASVLTF 2.372477e-03 1 4 8
global 692 96 8.972842e-02 4 0.156296 382703.0 TRAV4*01 TRAJ33*01 CLGGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 2.254656e-12 2 3 5
global 696 96 8.972842e-02 4 0.156296 382703.0 TRAV4*01 TRAJ33*01 CLVGDSNYQLIW TRBV10-1*01 TRBJ2-1*01 CASSGTNYGEQFF 2.254656e-12 2 3 5
global 70 24 9.015412e-02 1 0.000022 54.0 TRAV12-1*01 TRAJ36*01 CAVRTGVNNLFF TRBV23-1*01 TRBJ2-7*01 CASSQTGTGSYEQYF 2.372477e-03 3 6 0
global 27 72 9.566851e-02 2 0.006857 16790.0 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-3*01 TRBJ2-6*01 CASSQGPGGASVLTF 4.363385e-05 1 4 8
global 71 24 1.068491e-01 1 0.000026 64.0 TRAV12-1*01 TRAJ36*01 CAVRTGVNNLFF TRBV23-1*01 TRBJ2-7*01 CASSSTGTGSYEQYF 2.671228e-03 3 6 0
global 37 96 1.629249e-01 3 0.063054 155152.0 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV6-1*01 TRBJ2-6*01 CASSEARTGASVLTF 3.973778e-03 1 2 8
global 38 72 1.743967e-01 2 0.009266 22799.0 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV6-2*01 TRBJ2-6*01 CASSEAASGASVLTF 2.476280e-06 1 4 8
global 29 96 2.466204e-01 2 0.011025 27048.0 TRAV1-2*01 TRAJ33*01 CAVMDSNYQLIW TRBV4-3*01 TRBJ2-6*01 CASSQDIGGSSGASVLTF 5.735359e-03 1 4 8
global 621 48 3.138613e-01 1 0.000077 188.0 TRAV35*01 TRAJ58*01 CAGQRQTGGSRLTF TRBV14*01 TRBJ1-2*01 CASSQGYDYTF 7.234651e-04 6 0 4
global 620 48 3.138613e-01 1 0.000077 188.0 TRAV35*01 TRAJ58*01 CAGQRQTGGSRLTF TRBV14*01 TRBJ1-2*01 CASSQGYDYTF 7.234651e-04 6 0 4
global 528 48 3.222084e-01 1 0.000079 193.0 TRAV27*01 TRAJ17*01 CAGEEVASNKLTF TRBV21-1*01 TRBJ1-5*01 CASSKGQGDQPQYF 2.086923e-04 5 3 4
global 33 96 4.254261e-01 2 0.014497 35671.0 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV10-2*01 TRBJ2-6*01 CASSVDSDSGASVLTF 8.912654e-03 1 4 8
global 23 96 4.278074e-01 2 0.014537 35596.0 TRAV1-2*01 TRAJ33*01 CAAMDSNYQLIW TRBV4-2*01 TRBJ2-2*01 CASSQAMGEHGAQLFF 8.912654e-03 1 4 8
global 529 48 6.610841e-01 1 0.000162 396.0 TRAV27*01 TRAJ17*01 CAGEGVASNKLTF TRBV21-1*01 TRBJ1-5*01 CASSSGQGDQPQYF 1.304321e-03 5 1 4
global 268 48 8.580531e-01 1 0.000210 514.0 TRAV17*01 TRAJ27*01 CATDANADKLTF TRBV19*01 TRBJ2-4*01 CASGQGGQNTQYF 1.517754e-03 4 3 2

tcr_clumping_logos


This figure summarizes the results of a CoNGA analysis that produces scores (TCR clumping) and clusters. At the top are six 2D UMAP projections of clonotypes in the dataset based on GEX similarity (top left three panels) and TCR similarity (top right three panels), colored from left to right by GEX cluster assignment; TCR clumping score; joint GEX:TCR cluster assignment for clonotypes with significant TCR clumping scores, using a bicolored disk whose left half indicates GEX cluster and whose right half indicates TCR cluster; TCR cluster; TCR clumping; GEX:TCR cluster assignments for TCR clumping hits, as in the third panel.

Below are two rows of GEX landscape plots colored by (first row, left) expression of selected marker genes, (second row, left) Z-score normalized and GEX-neighborhood averaged expression of the same marker genes, and (both rows, right) TCR sequence features (see CoNGA manuscript Table S3 for TCR feature descriptions).

GEX and TCR sequence features of TCR clumping hits in clusters with 3 or more hits are summarized by a series of logo-style visualizations, from left to right: differentially expressed genes (DEGs); TCR sequence logos showing the V and J gene usage and CDR3 sequences for the TCR alpha and beta chains; biased TCR sequence scores, with red indicating elevated scores and blue indicating decreased scores relative to the rest of the dataset (see CoNGA manuscript Table S3 for score definitions); GEX 'logos' for each cluster consisting of a panel of marker genes shown with red disks colored by mean expression and sized according to the fraction of cells expressing the gene (gene names are given above).

DEG and TCRseq sequence logos are scaled by the adjusted P value of the associations, with full logo height requiring a top adjusted P value below 10^-6. DEGs with fold-change less than 2 are shown in gray. Each cluster is indicated by a bicolored disk colored according to GEX cluster (left half) and TCR cluster (right half). The two numbers above each disk show the number of hits within the cluster (on the left) and the total number of cells in those clonotypes (on the right). The dendrogram at the left shows similarity relationships among the clusters based on connections in the GEX and TCR neighbor graphs.

The choice of which marker genes to use for the GEX umap panels and for the cluster GEX logos can be configured using run_conga.py command line flags or arguments to the conga.plotting.make_logo_plots function.
Image source: tmp_rhesus_tcr_clumping_logos.png

tcr_db_match


This table stores significant matches between TCRs in adata and TCRs in the file /scratch.global/ben_testing/conga/conga/data/new_paired_tcr_db_for_matching_nr.tsv

P values of matches are assigned by turning the raw TCRdist score into a P value based on a model of the V(D)J rearrangement process, so matches between TCRs that are very far from germline (for example) are assigned a higher significance.

Columns:

tcrdist: TCRdist distance between the two TCRs (adata query and db hit)

pvalue_adj: raw P value of the match * num query TCRs * num db TCRs

fdr_value: Benjamini-Hochberg FDR value for match

clone_index: index within adata of the query TCR clonotype

db_index: index of the hit in the database being matched

va,ja,cdr3a,vb,jb,cdr3b

db_XXX: where XXX is a field in the literature database
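The two multiple-testing columns above can be illustrated with a short sketch. The function names are hypothetical, and whether CoNGA applies the Benjamini-Hochberg procedure in exactly this form is an assumption; the code shows the standard step-up procedure.

```python
def adjusted_match_pvalue(raw_pvalue, num_query_tcrs, num_db_tcrs):
    """pvalue_adj as described above: raw P value * num query TCRs * num db TCRs."""
    return raw_pvalue * num_query_tcrs * num_db_tcrs


def benjamini_hochberg(pvalues):
    """Standard Benjamini-Hochberg FDR values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic FDR values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The first correction controls the chance of any false match, while the FDR value controls the expected fraction of false matches among those reported, so it is the less conservative of the two.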



tcr_graph_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed first by a t-test, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons

mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons

log2enr = log2 fold change of gene in neighborhood (will be positive)

gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores

num_fg = the number of clonotypes in the neighborhood (including center)

mean_fg = the mean value of the feature in the neighborhood

mean_bg = the mean value of the feature outside the neighborhood

feature = the name of the gene

mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR

clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.
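The relationship between mean_fg, mean_bg, and log2enr can be sketched as below. This is an assumption rather than a documented formula: it treats the means as log1p-transformed expression and adds a pseudocount of 1e-9 before taking the ratio. Under that assumption, the first row of the table in this section (mean_fg = 2.432636, mean_bg = 0.175342) gives a value close to the reported log2enr of 5.760394.

```python
from math import expm1, log2, nan


def log2_enrichment(mean_fg, mean_bg, pseudocount=1e-9):
    """log2 fold change between foreground and background mean expression."""
    # Assumption: means are on a log1p scale, so expm1 undoes the transform.
    fg = expm1(mean_fg) + pseudocount
    bg = expm1(mean_bg) + pseudocount
    if fg <= 0 or bg <= 0:
        return nan  # the ratio is undefined for non-positive values
    return log2(fg / bg)


print(log2_enrichment(2.432636, 0.175342))
```

A background mean that is very slightly negative makes the denominator non-positive, which yields NaN rather than a fold change.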


ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction nbr_frac graph_type feature_type
2.892767e-19 3.397924e-74 5.760394 2 6 ENSMMUG00000043894 2.432636 0.175342 71 -1 0.0 0.0 tcr_cluster gex
1.089014e-08 1.064123e-60 5.712788 0 10 ENSMMUG00000061119 1.753271 0.087108 36 -1 0.0 0.0 tcr_cluster gex
1.364521e-10 8.285322e-47 4.522135 2 6 ENSMMUG00000043894 1.703045 0.178513 103 415 0.0 0.1 tcr_nbr gex
7.391521e-09 5.327159e-41 4.325845 2 6 ENSMMUG00000043894 1.631611 0.186519 103 107 0.0 0.1 tcr_nbr gex
8.861353e-09 2.750179e-39 4.315207 2 6 ENSMMUG00000043894 1.627734 0.186954 103 626 0.0 0.1 tcr_nbr gex
1.066617e-08 6.673187e-39 4.273937 2 6 ENSMMUG00000043894 1.612688 0.188640 103 248 0.0 0.1 tcr_nbr gex
3.683860e-01 6.759251e-36 6.061486 0 1 ENSMMUG00000060662 0.797168 0.018091 103 924 0.0 0.1 tcr_nbr gex
5.602167e-08 2.577416e-35 4.175139 2 6 ENSMMUG00000043894 1.576652 0.192679 103 301 0.0 0.1 tcr_nbr gex
2.262062e-07 6.309623e-35 4.071092 2 6 ENSMMUG00000043894 1.538696 0.196933 103 59 0.0 0.1 tcr_nbr gex
1.756006e-07 1.146475e-34 4.027714 2 6 ENSMMUG00000043894 1.522876 0.198706 103 178 0.0 0.1 tcr_nbr gex
1.151267e-07 1.506979e-34 4.235354 2 6 ENSMMUG00000043894 1.598617 0.190217 103 785 0.0 0.1 tcr_nbr gex
1.537118e-07 8.457291e-34 4.123988 2 6 ENSMMUG00000043894 1.557992 0.194770 103 482 0.0 0.1 tcr_nbr gex
3.188034e-07 4.091844e-33 4.003964 2 6 ENSMMUG00000043894 1.514216 0.199677 103 479 0.0 0.1 tcr_nbr gex
3.532596e-07 4.468555e-33 4.001371 2 6 ENSMMUG00000043894 1.513271 0.199783 103 418 0.0 0.1 tcr_nbr gex
3.167149e-07 5.078867e-33 4.003478 2 6 ENSMMUG00000043894 1.514039 0.199697 103 432 0.0 0.1 tcr_nbr gex
6.089620e-07 7.975956e-33 3.970755 2 6 ENSMMUG00000043894 1.502111 0.201033 103 946 0.0 0.1 tcr_nbr gex
5.613666e-07 1.155395e-32 3.947235 2 6 ENSMMUG00000043894 1.493540 0.201994 103 101 0.0 0.1 tcr_nbr gex
8.191486e-01 1.204519e-32 5.719125 0 1 ENSMMUG00000060662 0.765785 0.021609 103 923 0.0 0.1 tcr_nbr gex
3.365246e-07 2.064042e-32 4.113990 2 6 ENSMMUG00000043894 1.554344 0.195179 103 758 0.0 0.1 tcr_nbr gex
3.440516e-06 1.180600e-30 3.804360 2 6 ENSMMUG00000043894 1.441542 0.207822 103 994 0.0 0.1 tcr_nbr gex
1.419424e+00 2.054940e-30 4.640636 2 4 ENSMMUG00000052673 0.505643 0.026039 103 544 0.0 0.1 tcr_nbr gex
1.711255e+00 2.961670e-30 4.503088 2 4 ENSMMUG00000052673 0.491586 0.027614 103 534 0.0 0.1 tcr_nbr gex
2.219197e-06 8.983608e-30 3.919719 2 6 ENSMMUG00000043894 1.483516 0.203118 103 158 0.0 0.1 tcr_nbr gex
1.759829e+00 1.224345e-29 5.504371 0 1 ENSMMUG00000060662 0.744081 0.024041 103 933 0.0 0.1 tcr_nbr gex
2.081256e+00 1.690525e-29 5.374644 0 1 ENSMMUG00000060662 0.730228 0.025594 103 928 0.0 0.1 tcr_nbr gex
3.165294e+00 1.985984e-29 5.238837 0 1 ENSMMUG00000060662 0.715144 0.027284 103 942 0.0 0.1 tcr_nbr gex
1.700937e-03 3.580677e-29 4.386767 2 2 ENSMMUG00000062085 1.077587 0.088580 103 504 0.0 0.1 tcr_nbr gex
3.839802e+00 3.827329e-29 5.002207 0 1 ENSMMUG00000060662 0.687489 0.030384 103 953 0.0 0.1 tcr_nbr gex
2.944461e-06 4.075794e-29 3.822363 2 6 ENSMMUG00000043894 1.448086 0.207088 103 511 0.0 0.1 tcr_nbr gex
1.894907e-01 5.332926e-29 4.193284 0 1 ENSMMUG00000057062 0.456475 0.031133 103 882 0.0 0.1 tcr_nbr gex
2.380103e-06 1.261190e-28 3.950724 2 6 ENSMMUG00000043894 1.494811 0.201852 103 241 0.0 0.1 tcr_nbr gex
6.128023e-02 2.962608e-28 4.347653 2 4 ENSMMUG00000052673 0.483071 0.030048 100 -1 0.0 0.0 tcr_cluster gex
3.388499e-06 3.743188e-28 3.867162 2 6 ENSMMUG00000043894 1.464382 0.205262 103 959 0.0 0.1 tcr_nbr gex
3.067582e+00 1.169482e-27 4.376205 2 4 ENSMMUG00000052673 0.478284 0.029105 103 564 0.0 0.1 tcr_nbr gex
3.784933e+00 1.452422e-27 4.398505 2 4 ENSMMUG00000052673 0.480644 0.028841 103 552 0.0 0.1 tcr_nbr gex
3.905234e+00 1.585636e-27 4.358971 2 4 ENSMMUG00000052673 0.476454 0.029310 103 556 0.0 0.1 tcr_nbr gex
3.802922e+00 1.623164e-27 4.365042 2 4 ENSMMUG00000052673 0.477099 0.029238 103 548 0.0 0.1 tcr_nbr gex
4.367138e+00 1.835152e-27 4.304220 2 4 ENSMMUG00000052673 0.470606 0.029966 103 533 0.0 0.1 tcr_nbr gex
4.236369e+00 2.186521e-27 4.200551 2 4 ENSMMUG00000052673 0.459394 0.031222 103 563 0.0 0.1 tcr_nbr gex
1.182252e-05 4.440368e-27 3.855867 2 6 ENSMMUG00000043894 1.460272 0.205723 103 46 0.0 0.1 tcr_nbr gex
7.825501e-06 4.814517e-27 3.864251 2 6 ENSMMUG00000043894 1.463323 0.205381 103 963 0.0 0.1 tcr_nbr gex
8.149115e-06 6.506373e-27 3.865820 2 6 ENSMMUG00000043894 1.463894 0.205317 103 949 0.0 0.1 tcr_nbr gex
8.783328e-06 8.852021e-27 3.831511 2 6 ENSMMUG00000043894 1.451413 0.206716 103 819 0.0 0.1 tcr_nbr gex
1.614262e-05 9.629497e-27 3.799316 2 6 ENSMMUG00000043894 1.439709 0.208027 103 185 0.0 0.1 tcr_nbr gex
5.078142e+00 1.343536e-26 5.122255 0 1 ENSMMUG00000060662 0.701731 0.028788 103 944 0.0 0.1 tcr_nbr gex
8.227295e+00 1.876362e-26 4.931847 0 1 ENSMMUG00000060662 0.678946 0.031341 103 941 0.0 0.1 tcr_nbr gex
2.051007e-05 4.690201e-26 3.660968 2 6 ENSMMUG00000043894 1.389521 0.213652 103 725 0.0 0.1 tcr_nbr gex
5.344732e+00 3.043984e-25 4.389552 2 4 ENSMMUG00000052673 0.479697 0.028947 103 530 0.0 0.1 tcr_nbr gex
5.919114e+00 3.446155e-25 4.359832 2 4 ENSMMUG00000052673 0.476545 0.029300 103 528 0.0 0.1 tcr_nbr gex
5.919114e+00 3.446155e-25 4.359832 2 4 ENSMMUG00000052673 0.476545 0.029300 103 529 0.0 0.1 tcr_nbr gex
Omitted 238 lines

tcr_graph_vs_gex_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (i.e., neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_tcr_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, i.e., where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: tmp_rhesus_tcr_graph_vs_gex_features_plot.png

tcr_graph_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e., the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: tmp_rhesus_tcr_graph_vs_gex_features_panels.png
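
The neighbor averaging used for these panels can be sketched as follows. This is only an illustration, not CoNGA's own code: the function name nbr_average and the array layout are hypothetical, and the sketch assumes that each clonotype's own score is included in its average.

```python
import numpy as np

def nbr_average(scores, nbrs):
    """Average a per-clonotype score over each clonotype's neighborhood.

    scores: 1D array with one raw feature score per clonotype.
    nbrs:   2D integer array of shape (num_clonotypes, K); row i holds the
            indices of the K nearest neighbors of clonotype i.
    Assumption: the clonotype itself is part of its own neighborhood.
    """
    scores = np.asarray(scores, dtype=float)
    nbrs = np.asarray(nbrs, dtype=int)
    # Sum the neighbor scores, add the clonotype's own score, divide by K+1.
    return (scores[nbrs].sum(axis=1) + scores) / (nbrs.shape[1] + 1)

# Toy data: 4 clonotypes, K=2 neighbors each.
scores = np.array([0.0, 1.0, 2.0, 3.0])
nbrs = np.array([[1, 2], [0, 2], [1, 3], [1, 2]])
avg = nbr_average(scores, nbrs)

# Plot order: increasing score, so the highest-scoring points are drawn last.
order = np.argsort(avg)
```

Drawing points in this order keeps the strongest signals visible when points overlap on the UMAP.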

tcr_genes_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed first by a t-test, for speed, and then by a Mann-Whitney U test for those neighborhood/score combinations whose t-test P-value passes an initial threshold (default is 10 times the P-value threshold).
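
The two-stage testing described above can be sketched with SciPy. This is a minimal illustration under stated assumptions, not CoNGA's implementation: the function name two_stage_test and its parameters are hypothetical, the t-test variant (Welch) and the two-sided Mann-Whitney alternative are guesses, and the prefilter is assumed to act on the adjusted t-test P-value.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind

def two_stage_test(fg, bg, num_comparisons, pval_threshold=1.0,
                   prefilter_factor=10.0):
    """Return (ttest_pvalue_adj, mwu_pvalue_adj); the second is None if skipped.

    fg: feature scores inside the neighborhood (foreground).
    bg: feature scores outside the neighborhood (background).
    Adjusted P-values are raw P-values times the number of comparisons
    and are not capped at 1.
    """
    # Stage 1: fast t-test (Welch variant is an assumption).
    _, t_p = ttest_ind(fg, bg, equal_var=False)
    t_adj = t_p * num_comparisons
    if t_adj > prefilter_factor * pval_threshold:
        return t_adj, None  # fails the loose prefilter, skip the slower test
    # Stage 2: rank-based Mann-Whitney U test on the survivors.
    _, mwu_p = mannwhitneyu(fg, bg, alternative='two-sided')
    return t_adj, mwu_p * num_comparisons

# A clearly elevated neighborhood passes both stages.
fg_hit = np.linspace(2.0, 3.0, 20)
bg_hit = np.linspace(0.0, 1.0, 200)
t_adj_hit, mwu_adj_hit = two_stage_test(fg_hit, bg_hit, num_comparisons=100)

# Identical means give a t-test P-value of 1, so the second test is skipped.
fg_null = np.array([0.0, 1.0, 2.0, 3.0])
bg_null = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
t_adj_null, mwu_adj_null = two_stage_test(fg_null, bg_null, num_comparisons=1000)
```

Because the prefilter is 10 times looser than the final threshold, adjusted t-test values above 1 can still appear in the table as long as the Mann-Whitney result is significant.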

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons
mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons
log2enr = log2 fold change of gene in neighborhood (will be positive)
gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores
tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores
num_fg = the number of clonotypes in the neighborhood (including center)
mean_fg = the mean value of the feature in the neighborhood
mean_bg = the mean value of the feature outside the neighborhood
feature = the name of the gene
mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR
clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.

In this analysis the TCR graph is defined by connecting all clonotypes that have the same VA/JA/VB/JB-gene segment (the analysis is run four times, once with each gene segment type).
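
As a rough sketch of such a gene-segment graph, the snippet below connects every pair of clonotypes that share a segment. The function name gene_segment_graph and the toy VB assignments are hypothetical; how CoNGA stores this graph internally is not shown on this page.

```python
from collections import defaultdict

def gene_segment_graph(segments):
    """Map each clonotype index to the other clonotypes using the same segment.

    segments: list with one gene segment name per clonotype (for example the
              VB gene), in clonotype order.
    """
    groups = defaultdict(list)
    for idx, seg in enumerate(segments):
        groups[seg].append(idx)
    # Each clonotype is connected to every other member of its group.
    return {idx: [j for j in groups[seg] if j != idx]
            for idx, seg in enumerate(segments)}

# Toy VB assignments for six clonotypes.
vb = ['TRBV6-8', 'TRBV20-1', 'TRBV6-8', 'TRBV19', 'TRBV20-1', 'TRBV6-8']
graph = gene_segment_graph(vb)
```

Running the same construction on the VA, JA, VB and JB columns gives the four graphs mentioned above.
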
ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction gene_segment graph_type feature_type
7.680095e+00 1.487624e-218 NaN 6 5 ENSMMUG00000048246 3.234783 -1.873645e-09 6 -1 0.000000 TRBV6-8 tcr_genes gex
8.028887e-14 2.917151e-168 10.269545 0 1 ENSMMUG00000060662 2.728752 1.152940e-02 32 -1 0.000000 TRAV8-7 tcr_genes gex
2.437364e-01 1.368213e-163 12.038369 0 1 ENSMMUG00000059234 2.615368 3.008084e-03 9 -1 0.000000 TRBV25-1 tcr_genes gex
5.133844e-04 3.438427e-155 9.972508 6 1 ENSMMUG00000062897 2.403146 9.961364e-03 15 -1 0.000000 TRBV11-2 tcr_genes gex
1.048790e-13 1.937740e-144 9.777905 0 10 ENSMMUG00000063185 2.975444 2.096385e-02 31 -1 0.000000 TRBV4-2 tcr_genes gex
6.739993e-26 3.745618e-142 8.207480 0 3 ENSMMUG00000062085 2.643389 4.323633e-02 57 -1 0.000000 TRBV4-3 tcr_genes gex
2.737426e-13 2.783030e-130 9.048454 0 1 ENSMMUG00000062211 3.003197 3.552807e-02 33 -1 0.000000 TRBV12-2 tcr_genes gex
2.132582e-04 1.006423e-124 8.088103 2 4 ENSMMUG00000056431 1.916044 2.106862e-02 20 -1 0.000000 TRAV35 tcr_genes gex
4.244478e-06 1.347413e-106 6.957203 2 4 ENSMMUG00000052673 1.189393 1.822269e-02 49 -1 0.000000 TRAV27 tcr_genes gex
3.387262e-05 2.640779e-105 8.074643 1 1 ENSMMUG00000056910 1.911052 2.114114e-02 17 -1 0.000000 TRAV16 tcr_genes gex
2.744935e-40 1.802070e-101 6.715584 2 6 ENSMMUG00000043894 2.955590 1.598175e-01 63 -1 0.000000 TRBV20-1 tcr_genes gex
2.658563e-05 6.566327e-98 7.110737 0 2 ENSMMUG00000054409 1.627660 2.917654e-02 30 -1 0.000000 TRAV6 tcr_genes gex
4.502685e-08 2.741429e-91 7.830931 0 0 ENSMMUG00000065017 2.286982 3.811179e-02 23 -1 0.000000 TRAV12-1 tcr_genes gex
3.779473e-03 1.091229e-84 7.699737 0 4 ENSMMUG00000059325 1.992804 3.002155e-02 20 -1 0.000000 TRAV25 tcr_genes gex
9.277174e-18 1.995422e-84 6.285125 0 10 ENSMMUG00000061119 1.994806 7.828967e-02 36 -1 0.000000 TRAV18 tcr_genes gex
1.553028e-04 2.356232e-74 5.894151 0 1 ENSMMUG00000061081 0.880054 2.344868e-02 48 -1 0.000000 TRAV8-2 tcr_genes gex
2.053014e-05 3.280283e-72 5.772890 0 1 ENSMMUG00000057062 0.943915 2.830943e-02 51 -1 0.000000 TRAV8-3 tcr_genes gex
5.678434e-61 9.028388e-66 5.071222 0 3 ENSMMUG00000056515 2.987117 4.447157e-01 89 -1 0.000000 TRBV6-3 tcr_genes gex
4.119979e-17 3.785737e-61 7.109418 0 0 ENSMMUG00000051385 3.076480 1.395675e-01 29 -1 0.000000 TRBV7-4 tcr_genes gex
5.181498e-04 8.604369e-46 6.726140 2 2 ENSMMUG00000062974 2.156282 6.967033e-02 13 -1 0.000000 TRAV13-2 tcr_genes gex
1.173425e-15 5.751559e-44 5.448288 1 5 ENSMMUG00000043894 2.628155 2.579457e-01 32 -1 0.000000 TRBV19 tcr_genes gex
7.460720e-01 2.190617e-32 6.246444 0 5 ENSMMUG00000062211 2.302868 1.120602e-01 9 -1 0.000000 TRBV12-3 tcr_genes gex
3.877101e-03 1.211601e-27 2.900490 0 3 ENSMMUG00000061119 0.593705 1.030727e-01 89 -1 0.000000 TRAV19 tcr_genes gex
2.189019e-25 1.551478e-27 4.503213 0 1 ENSMMUG00000056515 2.908419 5.676314e-01 43 -1 0.100000 TRBV10-2 tcr_genes gex
3.244874e-15 8.271428e-23 4.200101 0 1 ENSMMUG00000056515 2.740876 5.816071e-01 40 -1 0.000000 TRBV6-2 tcr_genes gex
6.150524e-04 1.308301e-21 6.151809 0 1 ENSMMUG00000051385 2.778609 1.925398e-01 12 -1 0.000000 TRBV7-6 tcr_genes gex
1.683511e+00 1.945236e-19 4.861590 2 5 ENSMMUG00000051385 1.998051 1.982498e-01 14 -1 0.000000 TRBV5-6 tcr_genes gex
1.120149e-01 2.156665e-15 4.792311 1 1 ENSMMUG00000043894 2.362086 2.978226e-01 17 -1 0.000000 TRBV21-1 tcr_genes gex
4.998835e-01 1.576433e-12 3.767046 4 8 KLRB1 1.498821 2.274150e-01 32 -1 1.000000 TRAV1-2 tcr_genes gex
1.900278e-01 8.629206e-06 2.087580 0 3 ENSMMUG00000056515 1.546471 6.255701e-01 45 -1 0.000000 TRBV9 tcr_genes gex
6.111556e-01 4.133253e-04 2.348992 0 4 ENSMMUG00000056515 1.711141 6.366813e-01 28 -1 0.000000 TRBV10-1 tcr_genes gex
5.625136e+00 1.952271e-02 2.281442 4 8 IL7R 1.939922 8.000069e-01 32 -1 1.000000 TRAV1-2 tcr_genes gex
6.899273e+00 3.391942e-02 1.825299 3 5 PPDPF 1.242306 5.277901e-01 29 -1 0.000000 TRBV7-4 tcr_genes gex
3.581038e-01 2.756213e-01 0.929537 0 1 TIGAR 1.490060 1.031299e+00 97 -1 0.041667 TRBJ1-2 tcr_genes gex
2.382486e-03 4.042441e-01 0.416653 1 4 RPL35A 3.630398 3.350432e+00 49 -1 0.000000 TRAV27 tcr_genes gex
5.202479e-01 1.337438e+00 0.426685 1 6 RPL8 3.215522 2.933484e+00 63 -1 0.000000 TRBV20-1 tcr_genes gex
6.754571e-01 4.552175e+00 2.381771 2 2 ENSMMUG00000006206 1.457456 4.899458e-01 8 -1 0.000000 TRAJ3 tcr_genes gex
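
The log2enr values in the table above appear to be consistent with a log2 ratio of the two means after undoing a log1p transform. This is an inference from the printed numbers, not a statement about CoNGA's code, and the helper name log2enr_from_means is hypothetical. The sketch below checks the idea against three rows copied from the table, including the row whose slightly negative mean_bg yields NaN.

```python
import numpy as np

def log2enr_from_means(mean_fg, mean_bg):
    """log2 ratio of expm1(mean_fg) to expm1(mean_bg).

    Assumption: mean_fg and mean_bg are on a log1p scale, so expm1 maps
    them back to a linear scale before the ratio is taken.
    """
    with np.errstate(invalid='ignore', divide='ignore'):
        return float(np.log2(np.expm1(mean_fg) / np.expm1(mean_bg)))

# (mean_fg, mean_bg, tabulated log2enr) copied from rows of the table.
rows = [
    (2.728752, 1.152940e-02, 10.269545),       # TRAV8-7 row
    (3.630398, 3.350432e+00, 0.416653),        # TRAV27 / RPL35A row
    (3.234783, -1.873645e-09, float('nan')),   # TRBV6-8 row, negative mean_bg
]
for mean_fg, mean_bg, tabulated in rows:
    print(tabulated, log2enr_from_means(mean_fg, mean_bg))
```

Under this reading, the NaN in the first table row comes from a background mean that is a tiny negative number, which makes the ratio negative and its logarithm undefined.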

tcr_genes_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: tmp_rhesus_tcr_genes_vs_gex_features_panels.png

gex_graph_vs_tcr_features


This table has results from a graph-vs-features analysis in which we look at the distribution of a set of TCR-defined features over the GEX neighbor graph. We look for neighborhoods in the graph that have biased score distributions, as assessed first by a t-test, for speed, and then by a Mann-Whitney U test for those neighborhood/score combinations whose t-test P-value passes an initial threshold (default is 10 times the P-value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a tcr feature.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons
ttest_stat = ttest statistic (sign indicates whether feature is up or down)
mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons
gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores
tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores
num_fg = the number of clonotypes in the neighborhood (including center)
mean_fg = the mean value of the feature in the neighborhood
mean_bg = the mean value of the feature outside the neighborhood
feature = the name of the TCR score
mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR
clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.


ttest_pvalue_adj ttest_stat mwu_pvalue_adj gex_cluster tcr_cluster num_fg mean_fg mean_bg feature mait_fraction clone_index nbr_frac graph_type feature_type
0.002003 5.266544 8.875816e-54 4 8 69 0.289855 0.002099 mait 1.000000 -1 0.00 gex_cluster tcr
0.000840 5.486600 1.298149e-42 4 8 69 0.318841 0.010493 TRAV1-2 0.882353 -1 0.00 gex_cluster tcr
0.547702 4.922232 7.170575e-32 4 8 103 0.194175 0.002176 mait 0.800000 24 0.10 gex_nbr tcr
0.388360 5.001786 7.130560e-24 4 8 103 0.213592 0.010881 TRAV1-2 0.800000 24 0.10 gex_nbr tcr
5.272281 4.353944 6.279203e-21 4 8 103 0.165049 0.005441 mait 0.680000 25 0.10 gex_nbr tcr
5.272281 4.353944 6.279203e-21 4 8 103 0.165049 0.005441 mait 0.680000 41 0.10 gex_nbr tcr
0.064964 4.327631 5.109696e-19 4 8 69 0.246377 0.020986 TRBJ2-6 0.705882 -1 0.00 gex_cluster tcr
3.834780 4.433647 8.530963e-16 4 8 103 0.184466 0.014146 TRAV1-2 0.680000 41 0.10 gex_nbr tcr
3.206622 4.478261 2.470043e-14 4 8 103 0.194175 0.018498 TRBJ2-6 0.480000 24 0.10 gex_nbr tcr
8.110873 4.238550 2.078654e-13 4 8 103 0.174757 0.015234 TRAV1-2 0.640000 23 0.10 gex_nbr tcr
0.059479 4.348959 2.510116e-13 4 8 69 0.275362 0.039874 TRAJ33 1.000000 -1 0.00 gex_cluster tcr
5.827015 2.882367 1.035680e-03 3 3 158 0.069620 0.010417 TRAV17 0.000000 -1 0.00 gex_cluster tcr
0.000862 5.026611 2.711672e-03 0 3 297 0.210607 0.055467 cd8 0.013514 -1 0.00 gex_cluster tcr
0.003944 5.010718 7.681031e-03 4 1 69 78.420290 65.929696 alphadist 0.058824 -1 0.00 gex_cluster tcr
0.064001 -4.129941 5.010006e-02 2 1 183 -0.033109 0.129706 cd8 0.000000 -1 0.00 gex_cluster tcr
0.003420 -4.786657 9.405589e-02 1 6 184 -0.031455 0.129537 cd8 0.000000 -1 0.00 gex_cluster tcr
0.375490 -3.802146 1.662729e-01 4 8 69 -0.119680 0.436249 af5 0.764706 -1 0.00 gex_cluster tcr
0.000731 -5.037161 1.812127e-01 0 5 297 0.016835 0.080000 TRBV20-1 0.000000 -1 0.00 gex_cluster tcr
0.277927 -3.889917 2.232103e-01 4 8 69 -0.695780 0.037686 af3 0.647059 -1 0.00 gex_cluster tcr
0.311807 -3.862978 3.091960e-01 4 8 69 -0.246574 0.151507 kf7 0.705882 -1 0.00 gex_cluster tcr
0.008690 -5.760626 3.366695e-01 1 6 103 -0.113480 0.124540 cd8 0.000000 593 0.10 gex_nbr tcr
6.534299 2.835147 3.637232e-01 2 6 183 0.120219 0.048868 TRBV20-1 0.000000 -1 0.00 gex_cluster tcr
0.285774 -5.022732 3.672394e-01 2 6 103 -0.120508 0.125328 cd8 0.000000 163 0.10 gex_nbr tcr
0.616313 4.835466 5.733172e-01 0 1 103 0.299643 0.078238 cd8 0.000000 20 0.10 gex_nbr tcr
9.197958 -6.563406 5.917695e-01 4 8 11 -0.939361 0.136207 kf7 1.000000 27 0.01 gex_nbr tcr
0.231033 -5.069156 6.242221e-01 2 6 103 -0.118154 0.125064 cd8 0.000000 1010 0.10 gex_nbr tcr
3.020025 3.064586 7.254659e-01 0 3 297 0.134680 0.067586 TRAV19 0.000000 -1 0.00 gex_cluster tcr
0.145216 -5.166565 8.317266e-01 2 6 103 -0.109760 0.124123 cd8 0.000000 545 0.10 gex_nbr tcr
0.856814 -3.563955 5.661843e+00 4 8 69 0.543890 0.582572 nndists_tcr 0.941176 -1 0.00 gex_cluster tcr
0.601287 -4.829535 7.050044e+00 1 6 103 -0.081544 0.120961 cd8 0.000000 432 0.10 gex_nbr tcr
0.276291 -5.017002 7.982202e+00 1 6 103 -0.099078 0.122926 cd8 0.000000 129 0.10 gex_nbr tcr
0.272385 -5.009599 8.364139e+00 1 0 103 -0.085437 0.121397 cd8 0.000000 680 0.10 gex_nbr tcr

gex_graph_vs_tcr_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_gex_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: tmp_rhesus_gex_graph_vs_tcr_features_plot.png
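
The label selection described for this plot (at most one feature per neighborhood, a display threshold of 1.0 on 'mwu_pvalue_adj', and a '*' suffix for cluster-level hits) can be sketched in plain Python. The function name top_feature_labels is hypothetical, and two details are assumptions: a hit is displayed when its score is less than or equal to the threshold, and cluster-level hits (clone_index of -1) are grouped by their GEX cluster.

```python
def top_feature_labels(hits, threshold=1.0):
    """Pick the best-scoring feature for each clonotype or cluster.

    hits: list of dicts with the table columns 'feature', 'mwu_pvalue_adj',
          'clone_index' and 'gex_cluster'.
    Returns a dict mapping ('clonotype', index) or ('cluster', id) to a label.
    """
    best = {}
    for hit in hits:
        if hit['mwu_pvalue_adj'] > threshold:
            continue  # score too weak to display
        if hit['clone_index'] >= 0:
            key = ('clonotype', hit['clone_index'])
        else:
            key = ('cluster', hit['gex_cluster'])
        # Keep only the smallest adjusted P-value for each key.
        if key not in best or hit['mwu_pvalue_adj'] < best[key]['mwu_pvalue_adj']:
            best[key] = hit
    # Cluster-level labels get a trailing '*'.
    return {key: hit['feature'] + ('*' if key[0] == 'cluster' else '')
            for key, hit in best.items()}

# A few rows copied from the gex_graph_vs_tcr_features table.
hits = [
    {'feature': 'mait', 'mwu_pvalue_adj': 8.875816e-54, 'clone_index': -1, 'gex_cluster': 4},
    {'feature': 'TRAV1-2', 'mwu_pvalue_adj': 1.298149e-42, 'clone_index': -1, 'gex_cluster': 4},
    {'feature': 'mait', 'mwu_pvalue_adj': 7.170575e-32, 'clone_index': 24, 'gex_cluster': 4},
    {'feature': 'TRAV1-2', 'mwu_pvalue_adj': 7.130560e-24, 'clone_index': 24, 'gex_cluster': 4},
    {'feature': 'mait', 'mwu_pvalue_adj': 6.279203e-21, 'clone_index': 25, 'gex_cluster': 4},
    {'feature': 'nndists_tcr', 'mwu_pvalue_adj': 5.661843e+00, 'clone_index': -1, 'gex_cluster': 4},
]
labels = top_feature_labels(hits)
```

In this toy selection the nndists_tcr row is dropped because its adjusted P-value exceeds 1.0, and TRAV1-2 is hidden wherever mait scores better for the same clonotype or cluster.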

gex_graph_vs_tcr_features_panels


Graph-versus-feature analysis was used to identify a set of TCR features that showed biased distributions in GEX neighborhoods. This plot shows the distribution of the top-scoring TCR features on the GEX UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: tmp_rhesus_gex_graph_vs_tcr_features_panels.png

graph_vs_features_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=102 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).


Image source: tmp_rhesus_graph_vs_features_gex_clustermap.png
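
The score processing described for this clustermap (Z-score normalization, neighbor averaging, then downscaling of same-modality features) can be sketched as follows. The function name clustermap_scores and the array layout are hypothetical, and including each clonotype in its own average is an assumption; only the factor 0.33 and the order of the steps come from the description above.

```python
import numpy as np

RESCALE_FACTOR_FOR_SELF_FEATURES = 0.33  # value quoted in the description

def clustermap_scores(raw, nbrs, is_self_feature):
    """Z-score a feature, average it over neighbors, and rescale if needed.

    raw:  1D array with one raw score per clonotype.
    nbrs: 2D integer array (num_clonotypes, K) of neighbor indices; K may be
          0, in which case the averaging step leaves the Z-scores unchanged.
    is_self_feature: True if the feature is from the same modality as the
          landscape (for example a GEX feature on the GEX landscape).
    """
    raw = np.asarray(raw, dtype=float)
    z = (raw - raw.mean()) / raw.std()
    # Average over the K neighbors plus the clonotype itself (assumption).
    z = (z[nbrs].sum(axis=1) + z) / (nbrs.shape[1] + 1)
    if is_self_feature:
        z = z * RESCALE_FACTOR_FOR_SELF_FEATURES
    return z

raw = np.array([0.0, 1.0, 2.0, 3.0])
nbrs = np.array([[1, 2], [0, 2], [1, 3], [1, 2]])
other_scores = clustermap_scores(raw, nbrs, is_self_feature=False)
self_scores = clustermap_scores(raw, nbrs, is_self_feature=True)

# With K=0 the result is just the plain Z-score.
no_avg = clustermap_scores(raw, np.empty((4, 0), dtype=int), is_self_feature=False)

# Undoing the downscaling means multiplying by 1/0.33, about 3.03.
undo_factor = round(1.0 / RESCALE_FACTOR_FOR_SELF_FEATURES, 2)
```

The last line is where the "multiply by 3.03" in the colormap note comes from.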

graph_vs_features_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=102 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: tmp_rhesus_graph_vs_features_tcr_clustermap.png

graph_vs_summary


Summary figure for the graph-vs-graph and graph-vs-features analyses.
Image source: tmp_rhesus_graph_vs_summary.png

hotspot_features


Find GEX (TCR) features that show a biased distribution across the TCR (GEX) neighbor graph, using a simplified version of the Hotspot method from the Yosef lab.

DeTomaso, D., & Yosef, N. (2021). "Hotspot identifies informative gene modules across modalities of single-cell genomics." Cell Systems, 12(5), 446–456.e9.

PMID:33951459

Columns:

Z: HotSpot Z statistic

pvalue_adj: Raw P value times the number of tests (crude Bonferroni correction)

nbr_frac: The K NN nbr fraction used for the neighbor graph construction (nbr_frac = 0.1 means K=0.1*num_clonotypes neighbors)
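
As a small worked example of the nbr_frac definition, the sketch below converts the two neighbor fractions into K for this dataset, using num_clonotypes = 1022 from the Stats section. The function name neighborhood_size is hypothetical, and truncating to an integer with a minimum of 1 is an assumption about the rounding.

```python
def neighborhood_size(nbr_frac, num_clonotypes):
    """Number of neighbors K for a given neighbor fraction.

    Assumption: the product is truncated to an integer and is at least 1.
    """
    return max(1, int(nbr_frac * num_clonotypes))

num_clonotypes = 1022  # num_clonotypes from the Stats section
k_small = neighborhood_size(0.01, num_clonotypes)
k_large = neighborhood_size(0.1, num_clonotypes)
print(k_small, k_large)
```

Under this assumption the larger fraction gives K=102, matching the K=102 quoted for the clustermaps, and neighborhoods that include the central clonotype have 11 or 103 members, matching the num_fg values in the gex_nbr rows of the graph-vs-features table.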


Z pvalue_adj feature feature_type nbr_frac
63.643690 0.000000e+00 ENSMMUG00000056515 gex 0.10
59.411769 0.000000e+00 ENSMMUG00000043894 gex 0.10
41.269847 0.000000e+00 ENSMMUG00000061119 gex 0.10
40.066080 0.000000e+00 ENSMMUG00000062085 gex 0.10
39.953858 0.000000e+00 ENSMMUG00000060662 gex 0.10
39.472488 0.000000e+00 ENSMMUG00000060662 gex 0.01
38.206218 0.000000e+00 ENSMMUG00000052673 gex 0.10
35.967220 1.946855e-279 ENSMMUG00000043894 gex 0.01
35.168258 4.370481e-267 ENSMMUG00000056515 gex 0.01
33.336175 8.259764e-240 ENSMMUG00000057062 gex 0.10
29.458605 6.982233e-187 ENSMMUG00000061119 gex 0.01
26.095667 2.918464e-146 ENSMMUG00000061081 gex 0.10
25.196685 3.113142e-136 ENSMMUG00000062085 gex 0.01
25.093568 4.178786e-135 ENSMMUG00000062211 gex 0.10
24.363803 3.601598e-129 mait tcr 0.10
24.372706 2.381113e-127 ENSMMUG00000057062 gex 0.01
23.428392 1.569420e-117 ENSMMUG00000054409 gex 0.10
22.289899 4.028425e-108 mait tcr 0.01
21.784105 2.346674e-101 ENSMMUG00000052673 gex 0.01
21.616693 8.944171e-100 ENSMMUG00000065017 gex 0.10
20.803983 2.847160e-92 ENSMMUG00000054409 gex 0.01
20.422579 7.530645e-89 CEBPD gex 0.01
19.385046 7.377658e-80 gex_cluster4 gex 0.01
18.691667 4.137002e-74 ENSMMUG00000056431 gex 0.10
17.915819 6.344050e-68 ENSMMUG00000065017 gex 0.01
17.755709 1.112891e-66 ENSMMUG00000056431 gex 0.01
17.437876 3.042184e-64 ENSMMUG00000061081 gex 0.01
16.447369 7.620416e-59 TRAV1-2 tcr 0.10
16.591287 5.759498e-58 ENSMMUG00000063185 gex 0.10
15.681633 1.757348e-53 TRAV1-2 tcr 0.01
15.085948 1.432285e-47 ENSMMUG00000062211 gex 0.01
14.305926 1.436347e-42 ENSMMUG00000051385 gex 0.10
13.955132 2.092637e-40 ENSMMUG00000059325 gex 0.10
13.575538 4.866011e-40 cd8 tcr 0.10
13.766224 2.908768e-39 CEBPD gex 0.10
13.386066 5.213968e-37 gex_cluster4 gex 0.10
12.648722 8.127933e-33 ENSMMUG00000003532 gex 0.10
12.609744 1.333800e-32 ARHGAP8 gex 0.01
11.461007 1.480295e-26 ENSMMUG00000063185 gex 0.01
11.332531 6.472525e-26 ENSMMUG00000059325 gex 0.01
10.567272 3.020817e-22 gex_cluster0 gex 0.10
10.134396 2.778910e-20 ENSMMUG00000051385 gex 0.01
9.463995 2.578652e-19 TRBJ2-6 tcr 0.10
9.711782 1.919955e-18 KLRB1 gex 0.01
9.238144 2.182051e-18 tcr_cluster8 tcr 0.10
8.694954 3.018653e-16 tcr_cluster6 tcr 0.10
8.612584 6.214564e-16 tcr_cluster8 tcr 0.01
8.903537 3.866700e-15 ENSMMUG00000056910 gex 0.10
8.841629 6.742919e-15 KLRG1 gex 0.01
8.226785 1.673060e-14 TRBV20-1 tcr 0.10
Omitted 79 lines

hotspot_gex_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the GEX UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered based on correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) to a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: tmp_rhesus_hotspot_combo_features_0.100_nbrs_gex_plot_umap_nbr_avg.png
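
The redundancy filter described above can be sketched as a greedy pass over the ranked features. This is an illustration rather than the code inside conga.plotting.plot_hotspot_umap: the function name filter_correlated_features is hypothetical, and using the signed Pearson correlation of the per-clonotype scores is an assumption.

```python
import numpy as np

def filter_correlated_features(names, scores, max_feature_correlation=0.9):
    """Keep features that are not too correlated with an already kept one.

    names:  feature names ranked best-first.
    scores: 2D array (num_features, num_clonotypes), rows in the same order.
    """
    scores = np.asarray(scores, dtype=float)
    kept = []
    for i in range(len(names)):
        # Compare feature i with every feature that was already accepted.
        redundant = any(
            np.corrcoef(scores[i], scores[j])[0, 1] >= max_feature_correlation
            for j in kept
        )
        if not redundant:
            kept.append(i)
    return [names[i] for i in kept]

# Toy data: 'b' is almost a multiple of 'a', 'c' is anti-correlated with 'a'.
names = ['a', 'b', 'c']
scores = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 2.0, 4.0, 6.1],
    [3.0, 1.0, 2.0, 0.0],
])
kept_names = filter_correlated_features(names, scores)
```

Because the pass runs in rank order, the feature that survives from a correlated group is always the best-scoring one.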

hotspot_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=102 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).


Image source: tmp_rhesus_hotspot_combo_features_0.100_nbrs_gex_plot_clustermap_nbr_avg.png

hotspot_tcr_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the TCR UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered based on correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) to a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: tmp_rhesus_hotspot_combo_features_0.100_nbrs_tcr_plot_umap_nbr_avg.png

hotspot_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=102 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: tmp_rhesus_hotspot_combo_features_0.100_nbrs_tcr_plot_clustermap_nbr_avg.png